Transliterated Mobile Keyboard Input via Weighted Finite-State Transducers

Authors

  • Lars Hellsten
  • Brian Roark
  • Prasoon Goyal
  • Cyril Allauzen
  • Françoise Beaufays
  • Tom Ouyang
  • Michael Riley
  • David Rybach

Abstract

We present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and we present several transducer optimizations that allow for high-accuracy decoding under such constraints. Our methods yield substantial accuracy improvements and latency reductions over an existing baseline transliteration keyboard approach. The resulting system was launched for 22 languages in Google Gboard in the first half of 2017.
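As a rough illustration of what transliteration as a weighted transduction means, the sketch below finds the lowest-cost way to segment a Latin-script string into pieces that each map to a native-script string. It is not the decoder described in the abstract: the pair table `PAIRS`, its costs, and the function name `transliterate` are hypothetical, and the dynamic program stands in for a shortest-path search (costs added along a path, minimum taken across paths) over a single-state weighted transducer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy pair table: Latin substring -> [(native string, cost)].
# Lower cost means more likely; the values are made up for illustration.
PAIRS: Dict[str, List[Tuple[str, float]]] = {
    "na": [("न", 0.5)],
    "ma": [("म", 0.5)],
    "n": [("न्", 1.0)],
    "m": [("म्", 1.0)],
    "a": [("अ", 1.5)],
}


def transliterate(
    text: str, pairs: Dict[str, List[Tuple[str, float]]] = PAIRS
) -> Optional[Tuple[float, str]]:
    """Return (total cost, output) for the cheapest segmentation, or None."""
    n = len(text)
    max_len = max(len(key) for key in pairs)
    # best[i] holds the cheapest (cost, output) covering text[:i], if any.
    best: List[Optional[Tuple[float, str]]] = [None] * (n + 1)
    best[0] = (0.0, "")
    for i in range(n):
        if best[i] is None:
            continue  # no segmentation reaches position i
        cost_i, out_i = best[i]
        for length in range(1, max_len + 1):
            if i + length > n:
                break
            segment = text[i:i + length]
            for native, cost in pairs.get(segment, []):
                candidate = (cost_i + cost, out_i + native)
                current = best[i + length]
                if current is None or candidate[0] < current[0]:
                    best[i + length] = candidate
    return best[n]


if __name__ == "__main__":
    print(transliterate("nama"))
```

A production decoder would additionally combine such a pair model with a language model over the native script and prune the search to meet latency limits; none of that is modeled here.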

Similar resources

Segmenting Sequences Semantically. Using Petri Net Transducers for the Translation from Sequential Data to Non-Sequential Models

In previous work we presented an extension and generalisation of finite state transducers (FSTs) to so-called Petri net transducers (PNTs). These are applicable to any form of transforming sequential input signals into non-sequential output structures – which can be used to represent the semantics of the input – by performing a weighted relation between partial languages, i.e. assigning one wei...

Mobile Keyboard Input Decoding with Finite-State Transducers

We propose a finite-state transducer (FST) representation for the models used to decode keyboard inputs on mobile devices. Drawing from learnings from the field of speech recognition, we describe a decoding framework that can satisfy the strict memory and latency constraints of keyboard input. We extend this framework to support functionalities typically not present in speech recognition, such ...

Urdu - Roman Transliteration via Finite State Transducers

This paper introduces a two-way Urdu-Roman transliterator based solely on a non-probabilistic finite state transducer that solves the encountered scriptural issues via a particular architectural design in combination with a set of restrictions. In order to deal with the enormous amount of overgenerations caused by inherent properties of the Urdu script, the transliterator depends on a set of ph...

Automatic Sanskrit Segmentizer Using Finite State Transducers

In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string either encoded as a Unicode string or as a Roman transliterated string and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi ...

Efficient Algorithms for Testing the Twins Property

Weighted automata and transducers are powerful devices used in many large-scale applications. The efficiency of these applications is substantially increased when the automata or transducers used are deterministic. There exists a general determinization algorithm for weighted automata and transducers that is an extension of the classical subset construction used in the case of unweighted finite...


Publication date: 2017